{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# F1 Score\n",
    "## 精准率和召回率在不同的应用背景下被关注得程度不同\n",
    "+ 股票预测(上升为1下降维0)时，我们更关注精准率，真实为0但是预测为1(FP)的结果很严重\n",
    "+ 癌症预测(患癌为1不患为0)时，我们更关注召回率，真实为1但是预测为0(FN)的结果很严重\n",
    "![癌症预测为例对比精准率和召回率](../02-精准率和召回率/images/癌症预测为例对比精准率和召回率.png)\n",
    "\n",
    "## 为了能兼顾精准率和召回率两个指标，我们引入了F1 Score\n",
    "> F1 Score是精准率precision和召回率recall的调和平均值\n",
    "\n",
    "![F1_Score的计算公式](images/F1_Score的计算公式.png)\n",
    "\n",
    "另一种形式为\n",
    "\n",
    "![F1_Score的计算公式2](images/F1_Score的计算公式2.png)\n",
    "\n",
    "两者的转化过程为\n",
    "\n",
    "![F1_Score的计算公式3](images/F1_Score的计算公式3.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 实现F1 Score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def f1_score(precision, recall):\n",
    "    try:\n",
    "        return 2 * precision * recall /(precision + recall)\n",
    "    except:\n",
    "        return 0.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "precision = 0.5\n",
    "recall = 0.5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.5"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "f1_score(precision, recall) # 精准率和召回率接近时，f1差不多为均值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "precision = 0.1 \n",
    "recall = 0.9"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.18000000000000002"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "f1_score(precision, recall)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "precision = 0.0\n",
    "recall = 1.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "f1_score(precision, recall) # 一个为0结果就为0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用传统的方法评价分类结果(即正确率)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn import datasets\n",
    "digits = datasets.load_digits() # 加载手写数字识别数据集\n",
    "X = digits.data\n",
    "y = digits.target.copy()\n",
    "# 多分类问题转换为二分类问题，即等于9和不等于9,数据比例大约是1:9,也就是说我们只要全认为是非9，按照传统计算正确率的方法我们也有90%的正确率\n",
    "y[digits.target==9] = 1\n",
    "y[digits.target!=9] = 0\n",
    "from sklearn.model_selection import train_test_split\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "D:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\linear_model\\logistic.py:432: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.\n",
      "  FutureWarning)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "0.9755555555555555"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.linear_model import LogisticRegression\n",
    "log_reg = LogisticRegression()\n",
    "log_reg.fit(X_train, y_train)\n",
    "log_reg.score(X_test, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用F1 Score评价分类结果"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_predict = log_reg.predict(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[403,   2],\n",
       "       [  9,  36]], dtype=int64)"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.metrics import confusion_matrix\n",
    "confusion_matrix(y_test, y_predict) # 混淆矩阵"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import precision_score\n",
    "pre_score = precision_score(y_test, y_predict) # 精准率"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import recall_score\n",
    "rec_score = recall_score(y_test, y_predict) # 召回率"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.8674698795180723"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "f1_score(pre_score, rec_score) # f1_score"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用sklearn中的自带的F1 Score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.8674698795180723"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.metrics import f1_score\n",
    "f1_score(y_test, y_predict) # 和上面自己计算地一样"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
